16 research outputs found

    Genetic Algorithm Based Model in Text Steganography

    Steganography is an ancient art. It is used for security in open systems. It focuses on hiding secret messages inside a cover medium. The most important property of a cover medium is the amount of data that can be stored inside it without changing its noticeable properties. There are many sophisticated techniques with which to hide, analyze, and recover that hidden information. This paper explores the use of Genetic Algorithm operators on the cover medium. We worked with text as the cover medium with the aim of increasing the robustness and capacity of hidden data. Elitism is used for the fitness function. The model presented here is applied to text files, though the idea can also be used on other file types. Our results show this approach satisfied both security and hiding capacity requirements. Furthermore, we found that an increase in the size of the secret message resulted in an exponential increase in the size of the generated cover text. We also found a close relationship between the size of the chromosome used and the population size.

    The SAWA corpus: a parallel corpus English-Swahili

    Research in data-driven methods for Machine Translation has greatly benefited from the increasing availability of parallel corpora. Processing the same text in two different languages yields useful information on how words and phrases are translated from a source language into a target language. To investigate this, a parallel corpus is typically aligned by linking linguistic tokens in the source language to the corresponding units in the target language. An aligned parallel corpus therefore facilitates the automatic development of a machine translation system and can also bootstrap annotation through projection. In this paper, we describe data collection and annotation efforts and preliminary experimental results with an English-Swahili parallel corpus.

    Unsupervised induction of Dholuo word classes using maximum entropy learning

    This paper describes a proof-of-principle experiment in which maximum entropy learning is used for the automatic induction of word classes for the Western Nilotic language of Dholuo. The proposed approach extracts shallow morphological and contextual features for each word of a 300k text corpus of Dholuo. These features provide a layer of linguistic abstraction that enables the extraction of general word classes. We provide a preliminary evaluation of the proposed method in terms of language model perplexity and through a simple case study of the paradigm of the verb stem “somo”.
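The perplexity measure mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard definition (the exponential of the average negative log-probability per token); the function name `perplexity` and the input format are hypothetical, and nothing here is taken from the paper's own implementation.

```python
import math


def perplexity(token_probs):
    """Return the perplexity of a sequence given per-token model probabilities.

    Assumes the standard definition: exp of the mean negative log-probability.
    `token_probs` is a hypothetical input: one probability in (0, 1] per token.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    neg_log_sum = -sum(math.log(p) for p in token_probs)
    return math.exp(neg_log_sum / len(token_probs))
```

A lower value means the language model finds the text less surprising; a model that assigns every token probability 1/4 has a perplexity of 4.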

    Model Ensembles of Artificial Neural Networks and Support Vector Regression for Improved Accuracy in the Prediction of Vegetation Conditions and Droughts in Four Northern Kenya Counties

    For improved drought planning and response, there is an increasing need for highly predictive and stable drought prediction models. This paper presents the performance of both homogeneous and heterogeneous model ensembles in the satellite-based prediction of drought severity using artificial neural networks (ANN) and support vector regression (SVR). For each of the homogeneous and heterogeneous model ensembles, the study investigates the performance of three model ensembling approaches: (1) non-weighted linear averaging, (2) ranked weighted averaging, and (3) model stacking using artificial neural networks. Using the approach of “over-produce then select”, the study used 17 years of satellite data on 16 selected variables for predictive drought monitoring to build 244 individual ANN and SVR models from which 111 models were automatically selected for the building of the model ensembles. Model stacking is shown to realize models that are superior in performance in the prediction of future drought conditions as compared to the linear averaging and weighted averaging approaches. The best performance from the heterogeneous stacked model ensembles recorded an R² of 0.94 in the prediction of future (1 month ahead) vegetation conditions on unseen test data (2016–2017) as compared to an R² of 0.83 and R² of 0.78 for ANN and SVR, respectively, in the traditional approach of selection of the best (champion) model. We conclude that despite the computational resource intensiveness of the model ensembling approach, the returns in terms of model performance for drought prediction are worth the investment, especially in the context of the continued exponential increase in computational power and the potential benefits of improved forecasting for vulnerable populations.
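The three ensembling approaches named in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: the function names are hypothetical, the rank-based weights (proportional to rank, best model highest) are one plausible reading of "ranked weighted averaging", and the stacking step swaps in a linear least-squares meta-learner where the paper used an artificial neural network.

```python
import numpy as np


def simple_average(preds):
    """Non-weighted linear averaging: mean over models (rows of `preds`)."""
    return np.mean(np.asarray(preds, dtype=float), axis=0)


def rank_weighted_average(preds, scores):
    """Ranked weighted averaging (assumed scheme: weight proportional to rank).

    `scores` holds one validation score per model, higher meaning better
    (for example R^2). The best model gets rank n, the worst gets rank 1.
    """
    preds = np.asarray(preds, dtype=float)
    order = np.argsort(scores)                    # indices from worst to best
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)  # worst -> 1, best -> n
    weights = ranks / ranks.sum()
    return weights @ preds


def stack_linear(train_preds, y_train, test_preds):
    """Stacking with a linear meta-learner (stand-in for the paper's ANN).

    Fits y ~ base-model predictions + intercept on training data, then
    applies the fitted combination to the base models' test predictions.
    """
    train_preds = np.asarray(train_preds, dtype=float)
    test_preds = np.asarray(test_preds, dtype=float)
    X = np.column_stack([train_preds.T, np.ones(train_preds.shape[1])])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y_train, dtype=float), rcond=None)
    X_test = np.column_stack([test_preds.T, np.ones(test_preds.shape[1])])
    return X_test @ coef
```

In each function, `preds` has one row per base model and one column per sample. Averaging needs no extra training, whereas stacking needs held-out predictions from the base models to fit the meta-learner.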